Jitter noise modeling and its removal using recursive least squares in shape from focus systems

Three-dimensional shape recovery from the set of 2D images has many applications in computer vision and related fields. Passive techniques of 3D shape recovery utilize a single view point and one of these techniques is Shape from Focus or SFF. In SFF systems, a stack of images is taken with a single camera by manipulating its focus settings. During the image acquisition, the inter-frame distance or the sampling step size is predetermined and assumed constant. However, in a practical situation, this step size cannot remain constant due to mechanical vibrations of the translational stage, causing jitter. This jitter produces Jitter noise in the resulting focus curves. Jitter noise is invisible in every image, because all images in the stack are exposed to the same error in focus; thus, limiting the use of traditional noise removal techniques. This manuscript formulates a model of Jitter noise based on Quadratic function and the Taylor series. The proposed method, then, solves the jittering problem for SFF systems through recursive least squares (RLS) filtering. Different noise levels were considered during experiments performed on both real as well as simulated objects. A new metric measure is also proposed, referred to as depth distortion (DD), which calculates the number of pixels contributing to the RMSE in percentage. The proposed measure is used along with the RMSE and correlation, to compute and test the reconstructed shape quality. The results confirm the effectiveness of the proposed scheme.

In recent years, extensive research has been conducted towards the recovery of 3D information from its corresponding 2D images. Given a set of images, important depth information can be effectively obtained and further used for 3D reconstruction. This information is used in various applications, such as robotic manipulation, automatic inspection, medical imaging, microscopy, consumer cameras, bioinformatics etc. [1][2][3][4][5][6] Shape from Focus (SFF) is the process of reconstructing the depth of the scene by actively changing the camera optics until the point of interest is in focus. It is one of the passive techniques that uses one camera for 3D shape reconstruction. The SFF system must record a large sequence of image frames of the object/scene that correspond to different levels of the object/scene in focus 7 . The camera optics can be changed by changing the lens position or the object position relative to the camera. The depth of the focused point can be obtained through the thin lens Gaussian law: wherein f denotes the focal length of the imaging device. The distance of the object point from the imaging device is given by u, and the position of the object point where it is best focused by the lens is represented by v, and is given in Fig. 1.
Images can be obtained by manipulating any of the above-mentioned factors, but typically u of the system is varied to acquire images. The optical microscope is an example of such type of a system. Nonetheless, for the SFF systems the magnification of the imaging system should be constant and the depth of field should be as shallow as possible 8 . After image acquisition, the resulting image stack I is represented by l × m × n dimensions. Each (1)

Focus measurement and focus curve modeling
Focus measurement is the main concept in the SFF systems, and the sharpness criterion used to evaluate the level of focus in an image is called the focus measure operator (FM) 15 . The principle behind FM is to respond to the high-frequency content in the image, and ideally it should produce maximum response when the image is focused 32 . The main objective of FM in SFF systems is to provide a sharp focus curve (in parallel to the optical axis) for every object point in the image stack 33 . The FM is applied to each pixel of the image stack, as given in the following equation: where F is the FM transformation of pixel P (i,j) (k) that results in the focus value � (i,j) in the kth image. It represents the focus behavior or the focus curve of the pixel 33 .
In conventional methods, the initial depth map D (i,j) is obtained by maximizing the focus curve along the optical axis 9,[34][35][36][37] . The value of k is obtained using (4), where � (i,j) (k) is at the maximum according to: However, the obtain depth map can be further improved by some depth adjustment using the correction suggested by Shim 38 : where f denotes the focal length of the imaging device.   www.nature.com/scientificreports/ The focus behavior of every individual pixel, or in other words the focus curves, depends on various factors such as the type of FM used, the camera parameters, and most importantly, the image texture around that object point 9,13,39 . If the images are properly acquired, then all the resulting focus curves are bell-shaped 16 . The Gaussian model 15,29 , Lorentz-Cauchy model 9 , and Quadratic model 40 are used to approximate these bell-shaped focus curves 41 .
The Gaussian model 42 is given by: the Lorentz-Cauchy model is given by: and the Quadratic model is given by: where the As, Bs and Cs are the parameters of the respective model. The unification of these models in the Quadratic model is provided by 41 . If (6) is simplified after logarithmic transformation, it transforms to (8) as follows: Equation (7) can be transformed to (8) by applying a reciprocal transformation and some simplification, as follows: The transformation of the Gaussian and Lorentz-Cauchy models to the Quadratic model is shown in Fig. 3. In the next section, the Quadratic model as given by (8) is used to model the Jitter noise in SFF systems.

Motivation
Shape from focus has been under research for many years, but there exist some unresolved problems that impact the system performance. One of them is the instability in the sampling step size. The images are meant to be obtained at different focus levels, at a predetermined constant step size 9 . However, in practice, this step size cannot remain constant because of the mechanical aspects of the imaging device and lens-focusing methods. The instability in sampling step size is referred to as Jittering or Jitter noise. This noise alters the focus values of the image, by oscillating along the optical axis and propagates through the entire stack 33,43 . Since the same level of error is present in all the images, it is impossible to eliminate it through traditional de-noising techniques. Jang et al. 44 used the Kalman filter to remove this noise. They also proposed other methods involving Kalman filter 43,[45][46][47][48] . However, all these methods used Kalman filter with scalar models. The system matrix was considered as 1, hence ignoring the dynamic nature of the focus cues. All of their methods obtained multiple images for eliminating jitter at each step. It means if there were n images in the stack, 100 samples for each step were obtained, thus requiring n × 100 samples for each focus curve. However, similar results can be obtained by taking the mean of the measurement values on every step position k. Also, the large number of images increases the complexity of the system, resulting in high computational costs, impacting the practical use of their methods. Jitter noise is assumed to have Normal (symmetric) and Levy distributions (non-symmetric) with fixed parameters 44,47 . Later, a dynamic Kalman filter was used to address these issues proving that the resultant Jitter noise is dynamic in nature and is position-dependent 33 . Its probability density function changes on every step, and depends on many factors.
In this manuscript, the modeling of Jitter noise is presented in section "Focus measurement and focus curve modeling", and the recursive least squares (RLS) filter is designed to eliminate this noise from the SFF systems. In this method, a single new data point is analyzed in each algorithm iteration in order to improve the estimation of the model parameters. Since RLS converges faster, an RLS filter is designed to remove the Jitter noise from the focus curves. The Quadratic function is used to design the measurement model of focus curves, and an expression for shape recovery is followed. In the proposed scheme, a single measurement is taken at each step. Thus, as opposed to previously proposed methods, only n samples are required for each focus curve.
The manuscript organization is as follows. Section "Proposed method" discusses Jitter noise modeling and the proposed methodology of this noise remval using RLS. Section "Results and discussions" presents the results and discussions. Finally, Section "Conclusion" presents the conclusion of this study.

Proposed method
Jittering or Jitter noise is an error in signal amplitude, caused by the variation in sampling frequency while sampling a signal. In SFF systems, Jitter occurs when there is uncertainty or unevenness in the sampling step size of either u or f, depending on the variable factor to acquire the stack of images. This section discusses the step size in both situations of image acquisition, followed by modeling of Jitter noise which will be later utilized by the proposed model.

Conditions and assumptions for SFF.
There are four main conditions for image acquisition in Shape from Focus systems. First, the object is moved towards (or away) from the lens, ensuring that the whole object is first defocused then it is gradually focused (on every point) and then completely defocused again 34 . This condition ensures that the object only moves along the optical axis and there is no movement perpendicular to the optical axis. Second, as the object is moved, the magnification of the imaging system is assumed to be constant for the image areas that are perfectly focused 8 . Third, the body (object) is piece-wise constant, and there is no occlusion in the scene 36 . Last, the Depth of Field (DoF) of the imaging device in SFF systems must be finite and shallow (as much as possible) 9 . If any of these three conditions is violated then SFF algorithms can not be applied unless pre-processing of the image-stack is performed, e.g. image registration, image resize, etc.
The first condition ensures that complete bell-shaped focus curves are obtained. If this condition is violated, then obtaining complete bell-shaped focus curves becomes difficult or impossible. If the DoF is infinite or large, then the acquired images will have the same degree of focus, voilating the second condition, thus making it difficult to measure depth/shape using image focus. The shallow DoF also guarantees that the magnification for focused points in the image-stack, is not changed or the change is minimal.
Step size in SFF image acquisition. Observing all the necessary conditions of SFF mentioned above, i.e. the object is moved towards or away from the lens in constant small steps of u , the depth of field is as shallow as possible, and the focal length as well as magnification are kept constant; the step size expression u is provided by Muhammad and Choi 9 . The simplified expression for u (step size) can be given as: which is the maximum limit for u 9 .
As mentioned before, the images can also be obtained by changing the focal length of the system in small, constant increments of f 49 . In this case the focal length of the device is varied while the object is held static in front of the imaging device. This type of technique is mostly used by auto-focusing algorithms, for searching the best focal position for a single point. It can also be used for depth and shape estimation of the object under consideration 11 . Regardless of the image acquisition technique, an image is stored at every step to obtain a stack of images. where a, b and c are the function parameters, and f(k) is the Quadratic function. The sample points of this function are represented by 1 ≤ k ≤ n . The step size is k , and is 1. For modeling Jitter noise, the uncertainty in step size is considered as, ζ ∼ N(0, σ 2 ζ ) and thus (12) can be written as follows: By expanding the squared terms and simplifying using the Taylor series 50 , (13) can be written as: the following equation is obtained by using (12) and (14): It can be observed in (15) that the noise on the RHS of the equation is multiplied by the first and second derivatives of the function. Therefore, it can be concluded that the Jitter noise, in SFF systems, is dependent on the slope and concavity of the focus curves. If ζ is Normal ( N(0, σ 2 ζ ) ), then ( k + ζ ) follows N(k, σ 2 ζ ) , which given by: Rewriting (15): where ξ 1 and ξ 2 are given by: where ξ 1 is normally distributed, mean µ 1 = 0 and variance σ 2 1 = (2ak + b) 2 σ 2 ζ . Meanwhile, ξ 2 follows a Gamma ( Ŵ ) distribution, with mean µ 2 = aσ 2 ζ , and variance σ 2 2 = 2a 2 σ 4 ζ . The value of the variance σ 2 1 is different at every kth step, and becomes zero at: The direction of ξ 2 depends on the sign of a, and will always tend to the concavity of the function. The range ( R(ξ ) ) of Jitter noise is provided by (18): In order to obtain the pdf ( p(ξ ) ) of the Jitter noise, ξ 1 and ξ 2 can be combined as ξ = ξ 1 + ξ 2 , and p(ξ ) is given by (19), where the symbols r 1 and r 2 in (19) are computed using the following equation: Expression (19) holds only if ζ follows normal distribution. The Jitter noise ξ is the resulting noise in focus values due to uncertainty ζ in sampling step u.
Proposed Jitter removal methodology. The proposed scheme can be applied after the FM transformation is performed using (3). Pertaining to its faster convergence property, a recursive least squares (RLS) filter www.nature.com/scientificreports/ is designed to fully eliminate the Jitter noise from the focus curves. RLS is one of the most well-known adaptive filters that recursively calculate the coefficients of a given function, minimizing the cost function related to input measurements 51,52 .
Recursive least squares filter design. The measurements of focus curves (of every individual pixel) depend on various factors 9,13 . These focus curves are bell-shaped 16 and hence different models such as Gaussian model 15 , Lorentz-Cauchy model 9 , and Quadratic model 40 are used to approximate them. The advantage of the Quadratic model is, that it makes the focus profiles collected by different FMs to be modeled in a common form that is easy to handle and has computational advantages 41 . Thus, the measurement model of the proposed RLS scheme uses the Quadratic function given in (12), and can also be represented using (20): where a, b, and c are the unknown constant coefficients, and by measuring this function, a measurement model is obtained as: where y(k) denotes the measurements of focus curve after FM application at the kth image, H(k) is the measurement matrix at k, and ξ(k) is the measurement error representing Jitter noise with pdf given in (19). The unknown is computed on every step k as follows: www.nature.com/scientificreports/ where σ 2 ξ = Var(ξ ) is the variance of Jitter noise. The optimal least squares estimate of the parametric vector X and covariance �(k) are computed by the following recursive equations: In (23) the covariance �(k) can be redesigned to remove the matrix inversions using the Sherman-Morrison formula [53][54][55][56] , and can be rewritten as follows: The summary of equations for the proposed filter design is provided in Table 1. After the modified stack is obtained, the values of the focus curve ( � (i,j) (k) ) for each object point are taken and compared to a threshold value (T) (which is set heuristically to ignore tails of the focus curves). The values of � (i,j) (k) < T are ignored. Additionally, for values of � (i,j) (k) ≥ T , the value of index k is stored in κ s , followed by initialization of a new index parameter κ . The measurement value of focus for point (i, j) in consideration is taken as y(κ) = � (i,j) (k) . The next step calculates state measurement matrix ( H(κ) ), parametric vector ( X (κ) ), covariance matrix ( �(κ) ), and filter gain ( L(κ) ) using equations summarized in Table 1. After completion of all the iterations, the depth of the object point (i, j) is computed by modifying (25) as follows:

Results and discussions
The experimental results and discussions are presented in this section. The section is divided into three subsections. First, the experimental setup details are given, followed by the reconstructed-shape assessment criteria, and later a detailed analysis of the effects of Jitter noise and its removal using RLS filter for SFF is provided. .

Recursive least squares filter equations
Filter gain: L(k) A summary of the objects used in the 3D shape analysis is provided in Table 2. Ten simulated datasets of simulated cone are generated with different focus positions and Jitter noise levels using a camera simulation software (AVS). The details of AVS are provided in 13,37,57 . The MATLAB code used to generate the simulated object image set is downloaded from 13 . The AVS software is inputted with the depth map, texture image, and camera parameters. The texture map comprises concentric circles with alternating black and white stripes. The depth maps and the texture images used for image generation via AVS for all sequences, of Simulated Cone are similar. The difference in each dataset is the uncertainty ζ ∼ N(0, σ 2 ζ ) in step size u used to generate the sequences to study the effect of Jitter on shape reconstruction. The values of noise ζ with variance σ 2 ζ for u are 0, 0.1, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, and 2.0, which result in Jitter noise ξ with variance σ 2 ξ = 2a 2 σ 4 ζ . The real datasets consist of real objects, Real Cone, Real Plane, LCD-TFT Filter, Groove, Coin, and Image-I. These image sequences were originally in grayscale. Figure 4 provides the ground truths of the Simulated and Real cones. Figure 5 shows the 10th frame of each image sequence. These image sequences have been widely used by many researchers including 11,24,40,[58][59][60] .
Images of Real Cone and Real Plane were taken using the CCD camera system 17 . The LCD-TFT filter images are microscopic images of an LCD color filter. The coin sequence consists of magnified images of Lincoln's head from the back of the US penny. The sequence of Image-I is the letter I engraved in a metallic surface. These images were obtained using the microscopic control system (MCS). This system comprised of a personal computer integrated with a frame grabber board (Matrox Meteor-II) and a CCD camera (SAMSUNG CAMERA SCC-341) mounted on a microscope (NIKON OPTIPHOT-100S). Computer software in MCS acquired the images by controlling the lens position through a stepper motor driver (MAC 5,000), possessing a 2.5nm minimum step length. Every image stored for each sequence at every step was captured by varying the object plane.   Metric measures. The quality of shape reconstruction can be measured by the difference between the reconstructed shape and the ideal shape, and it is set to be degraded when the difference is increased. This section analyzes the quality of the depth map obtained by using different FMs under various levels of Jitter noise. The quality of depth map is maximum when the obtained depth map is indistinguishable from the original map, and the difference is zero. In the previous literature, several quality metrics have been presented for accessing the quality of shape 61 . In this manuscript, RMSE and correlation are used to compare the proposed method combined with various FM operators under different levels of Jitter noise. A new depth assessment metric referred to as depth distortion (DD) is also proposed in this manuscript. Root Mean Square Error (RMSE) is the square root of the variance of the residuals of the data under observation. This indicates how close the perceived shape is to the original shape 62 , and is given by: where G true (i,j) is the ground truth, D obtained is the obtained depth map, and l × m are the dimensions of the depth maps. A higher value of RMSE indicates a larger error in the shape reconstruction. For better results, the value of RMSE should be closer to zero.
Correlation is a similarity measure between two shapes 62 , and is given by: where cov is the covariance, and σ 2 G and σ 2 D are the variances of G true and D obtained , respectively. Depth Distortion (DD) metric indicates the number of pixels (in percentage), which contribute to RMSE. It is calculated as follows: where, E G and E D are the edge images of the ground truth and the obtained depth map, respectively, and can be calculated by applying edge operators (e.g., Canny or Sobel operator). If the ground truth has any edge, then the number of pixels contributing to that edge should be similar to the obtained depth map; thus, the difference will be zero. But if the obtained depth map has some distortions in depth, then the edge image will have extra pixels from depth distortion; hence, it will contribute to RMSE. The proposed metric determines the quantity of these pixels.  Table 3, the proposed method is compared with previous methods provided. The Table 3. Comparison of proposed method with previous methods using SML, GLVA and TENG with Jitter www.nature.com/scientificreports/ www.nature.com/scientificreports/  Table 8. RMSE, Correlation (Cor.) and Depth Distortion (DD) in % for shape reconstruction using FM:SFRQ with different levels of variance of Jitter noise ( σ 2 ξ = (2ak + b) 2 σ 2 ζ + 2a 2 σ 4 ζ ), with and without proposed RLS filter.  Table 9. RMSE, Correlation (Cor.) and Depth Distortion (DD) in % for shape reconstruction using FM:TENG with different levels of variance of Jitter noise ( σ 2 ξ = (2ak + b) 2 σ 2 ζ + 2a 2 σ 4 ζ ), with and without proposed RLS filter. Along with the above mentioned methods, the proposed RLS method is also compared with the dynamic Kalman filter as proposed by Mutahira et al. 33 , which also utilizes just one iteration per step. However, since the system model in their method is modeled using Taylor's series, it first estimates the derivatives of the function ( f ′ (k), f ′′ (k)&f ′′′ (k) ) using a single measurement of f(k) and then using these estimates (of the derivatives), the parameters of the cubic equation (a, b, c, and d) are estimated. Later the depth is recovered using the indirect estimated values of the parameters. This indirect estimation introduces lags and errors in the final depth map. The proposed scheme presented in this manuscript is based on estimating the parameters of the quadratic equation (a, b, and c) directly from the measurement. Thus, depth estimation using proposed scheme produces significantly lower error contributing to better RMSE, DD and correlation values, as can be seen in Table 3.  Tables 4, 5    www.nature.com/scientificreports/ average of metrics when FM is used only for Jitter noise with variance σ 2 ξ , where 0 ≤ σ 2 ζ ≤ 2 . The solid bars in the graphs represent the average of metrics when FM is used with the proposed RLS technique. It can be seen that SML has shown the best results when used with the proposed RLS method. The tables and figures clearly show that when the proposed technique is used for all levels of Jitter noise the shape reconstruction results improve as compared to when using FM only. Figure 9 shows the shape reconstruction of the simulated cone, when only FMs are used, and Fig. 10 represents the reconstructed shape of simulated cone when FMs are used with the proposed RLS scheme, for a noise range of σ 2 ζ : 0, 0.1, 0.5, 1.0, 1.5, 2.0 , respectively, which results in Jitter noise with σ 2 ξ . These results clearly demonstrate the effectiveness of the proposed scheme. The shape reconstruction, when only FMs are used, have rough surface when the noise is increased from 0 to 2.0; whereas, for the same noise range, the proposed scheme has shown promising results. Table 10 shows the results of Real Cone shape reconstruction for different levels of noise. These results show the similar trends as the simulated cone. When only FMs are used, the shape reconstruction deteriorates significantly with an increase in noise, whereas when these FMs are combined with the proposed RLS technique, the noise can be filtered out resulting in better shape reconstruction. Figure 11 represents the reconstructed shape of Real Cone using different FMs only and FMs with the proposed RLS technique. When the noise increases, the surface roughness is clearly visible in the figure when only FMs are used, whereas when RLS is combined   When using FM only, the roughness in shape was because of jittering, which is smoothed in the reconstructed shape by the application of the proposed method. The cylindrical shape of the filter is preserved in LCD-TFT filter, and the surrounding surface is smoothed by the filtering process. Jitter in the sequence of the data-set of Image-I is quite low, thus not much difference was observed visually. However, near the vertical axis of 175 value, a depth abnormality can be observed in shape reconstruction when using all the FMs, and also with the proposed scheme in the case of Coin sequence. The sides and center of groove are over-exposed, in the Groove image sequence. This causes texture degradation, which is critical in SFF systems. The change in focus levels is exhibited only by the slopes in the middle. Figure 16 shows the shape reconstruction results of the Groove using different FMs and FMs with the proposed filter application.

Conclusion
Shape from focus is one of the passive techniques used to recover 3D shapes of the objects from their respective 2D images, by using the focus information present in the scene. SFF requires a huge number of images obtained at different focus levels, and the sampling step size for this image acquisition is taken as a constant. However, this inter-frame distance is susceptible to errors due to mechanical inaccuracies, such as flaws in gear assembly of the translational stage or lens assembly of the focusing device. These errors are termed as Jitter noise. Jitter  www.nature.com/scientificreports/ noise is not visible in images, because each pixel in an image is subjected to the same error in focus, and thus traditional techniques of image denoising do not work in this case. In this paper, Jitter noise is modeled using the Quadratic function for focus curves, and a recursive least squares filter is designed to remove this noise from SFF systems. In RLS, a single new data point is analyzed in each algorithm iteration to improve the estimation of the model parameters. Since it converges faster, an RLS filter is designed for shape from focus systems to remove the Jitter noise from the focus curves. To check the robustness of the results the proposed manuscript employs the RMSE and correlation and proposes a new metric referred to as the Depth Distortion metric, which indicates the number of pixels in percentage contributing to the RMSE. Seven objects are used for experiments herein: one simulated and six real. Ten noise levels are tested on the simulated object and four levels on the real objects. The experimental results validate the effectiveness of the proposed scheme.

Data availability
The datasets used and analyzed during the current study is available from the corresponding author on reasonable request.